feat(cli): quantcpp client (SSE streaming) + serve discoverability (#47)
Merged
The HTTP server already supported OpenAI-compatible SSE streaming (controlled by `"stream": true` in the request body), but it wasn't discoverable from the CLI. This PR makes it explicit and easy to use.

New: `quantcpp client PROMPT [--url ...] [--no-stream]`
- Sends a chat completion to a running `quantcpp serve` endpoint
- Default mode is streaming (SSE) — tokens print as they arrive
- `--no-stream` falls back to a single JSON response
- Stdlib only (`urllib`) — no extra dependencies

Improved: `quantcpp serve` startup output
- Now prints all three endpoints (chat/completions, models, health)
- Shows curl examples for both streaming and non-streaming modes
- Shows an OpenAI Python SDK snippet for drop-in usage

Verified end-to-end: the server streams token-by-token, the client decodes SSE chunks correctly, and `--no-stream` returns a single JSON response.

README (EN/KO) and guide CTA updated to mention `quantcpp client` and the streaming/non-streaming choice.

Version: 0.12.0 → 0.12.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
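For reviewers, a minimal sketch of how an OpenAI-compatible SSE stream can be decoded with the stdlib only (the function name is illustrative, not the actual implementation in this PR):

```python
import json

def iter_sse_tokens(lines):
    """Yield content tokens from OpenAI-style SSE event lines.

    Each event line looks like `data: {...json chunk...}` and the
    stream terminates with the sentinel `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines and comments between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        # Streaming chunks carry incremental text in choices[0].delta.content
        delta = chunk["choices"][0].get("delta", {})
        token = delta.get("content")
        if token:
            yield token
```

In practice the client would feed this generator the decoded lines of the HTTP response body and print each token as it arrives.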
The HTTP server already supported OpenAI-style SSE streaming via `"stream": true`, but it wasn't easy to discover or test from the CLI.
New: `quantcpp client PROMPT`
```bash
quantcpp serve llama3.2:1b -p 8080 # in one terminal
quantcpp client "What is gravity?" # in another — streams tokens via SSE
quantcpp client "Hi" --no-stream # single JSON response
quantcpp client "Hi" --url http://other:8081
```
Improved: `quantcpp serve` startup
Now prints all endpoints, curl examples for both streaming and non-streaming modes, and the OpenAI Python SDK snippet.
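Since the client is stdlib-only, the non-streaming path can be sketched with `urllib` roughly as follows (helper names, the default URL, and the `"default"` model name are assumptions for illustration):

```python
import json
import urllib.request

def build_chat_request(prompt, url="http://localhost:8080/v1/chat/completions",
                       stream=True):
    # Assemble an OpenAI-compatible chat completion request body.
    body = json.dumps({
        "model": "default",  # placeholder; the server picks its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat_once(prompt, url="http://localhost:8080/v1/chat/completions"):
    # Non-streaming mode: one POST, one JSON response.
    req = build_chat_request(prompt, url=url, stream=False)
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```

The streaming path uses the same request with `"stream": true` and reads the response line by line instead of all at once.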
Verified end-to-end: the server streams token-by-token, the client decodes SSE chunks correctly, and `--no-stream` returns a single JSON response.
Version 0.12.0 → 0.12.1.